Self-Distillation Multi-task Learning Integrating Multi-dimensional Perception for Multimodal Sequential Recommendation

doi:10.16451/j.cnki.issn1003-6059.202508001


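Abstract (translated from Chinese): As an important application scenario of recommender systems, multimodal sequential recommendation has become a research focus in both industry and academia. However, existing multi-task learning methods for multimodal sequential recommendation do not fully consider the high-order relationships within modalities or the enhancing effect of short-term sequences. Their capacity for semantic expression and interest representation learning is therefore limited, which results in a low degree of personalization. This paper proposes a self-distillation multi-task learning model integrating multi-dimensional perception for multimodal sequential recommendation (SD-MTMP). First, on the basis of topic extraction from user reviews, a user-topic hypergraph and an item-topic hypergraph are constructed to model the high-order semantic associations within user groups and item collections, respectively, generating topic-aware node representations. A weighted bipartite graph is also built from the user-item rating matrix to generate rating-aware node representations. Then, a cross-modal self-distillation auxiliary task is designed, achieving semantic alignment through knowledge transfer from topic-aware representations to rating-aware representations. Meanwhile, the effects of user ratings and time intervals on short-term sequences are jointly considered to establish a dual-aware attention mechanism that accurately models users' short-term interests. On this basis, a multi-task learning strategy suited to multimodal sequential recommendation is proposed, in which joint optimization of the recommendation loss and the self-distillation loss further enhances representation semantics and improves recommendation performance. Finally, experiments on three public datasets demonstrate the effectiveness of SD-MTMP.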
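The joint optimization of a recommendation loss and a self-distillation loss described in the abstract can be illustrated with a minimal sketch. The abstract does not state the concrete loss functions, so everything below is an assumption for illustration: a BPR-style pairwise ranking loss stands in for the recommendation loss, a cosine distance between the topic-aware (teacher) and rating-aware (student) vectors stands in for the self-distillation loss, and the names `bpr_loss`, `cosine_distill_loss`, `joint_loss`, and the weight `lam` are hypothetical.

```python
import math


def bpr_loss(pos_score, neg_score):
    """Assumed recommendation loss: -log(sigmoid(pos_score - neg_score))."""
    diff = pos_score - neg_score
    # Numerically stable form of log(1 + exp(-diff)).
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def cosine_distill_loss(teacher, student):
    """Assumed distillation loss: 1 - cosine similarity of the two vectors.

    In an autodiff framework the teacher (topic-aware) vector would
    typically be detached so that only the student (rating-aware)
    representation is pulled toward it.
    """
    dot = sum(t * s for t, s in zip(teacher, student))
    norm_t = math.sqrt(sum(t * t for t in teacher))
    norm_s = math.sqrt(sum(s * s for s in student))
    return 1.0 - dot / max(norm_t * norm_s, 1e-12)


def joint_loss(pos_score, neg_score, topic_repr, rating_repr, lam=0.1):
    """Multi-task objective: recommendation loss + lam * distillation loss."""
    rec = bpr_loss(pos_score, neg_score)
    distill = cosine_distill_loss(topic_repr, rating_repr)
    return rec + lam * distill
```

With `lam = 0` the objective reduces to the ranking loss alone. Larger values of `lam` put more weight on aligning the rating-aware representation with the topic-aware one.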



Abstract: As an important application scenario of recommender systems, multimodal sequential recommendation is a research focus in both industry and academia. However, existing multi-task learning approaches for multimodal sequential recommendation fail to fully consider the high-order relationships within modalities and the enhancing effect of users' short-term sequences. Consequently, these approaches achieve a low degree of personalization because their semantic representations and interest modeling are weak. To address this issue, a self-distillation multi-task learning approach integrating multi-dimensional perception for multimodal sequential recommendation (SD-MTMP) is proposed. First, topics are extracted from user reviews, and user-topic and item-topic hypergraphs are constructed to model the high-order semantic correlations within user groups and item collections, respectively. Topic-aware node representations are generated through hypergraph convolution. Simultaneously, a weighted bipartite graph is built from the user-item rating matrix to generate rating-aware node representations. Second, a cross-modal self-distillation auxiliary task is designed to achieve semantic alignment by transferring knowledge from topic-aware representations to rating-aware representations. Additionally, a dual-aware attention mechanism is established by jointly considering the effects of user ratings and time intervals on short-term sequences, so that users' short-term interests are modeled accurately. On this basis, a multi-task learning strategy is proposed for multimodal sequential recommendation. It jointly optimizes the recommendation loss and the self-distillation loss, thereby further enhancing the semantic expressiveness of representations and improving recommendation performance. Finally, experiments on three public datasets demonstrate the effectiveness of SD-MTMP.
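To make the hypergraph convolution step in the abstract more concrete, the following is a minimal sketch of one propagation step over a user-topic hypergraph, where users are nodes and each topic is a hyperedge. The abstract does not give the operator, so this uses an assumed, simplified mean-aggregation variant (node to hyperedge, then hyperedge to node) with no learnable weights; the function name `hypergraph_conv` and the toy data are hypothetical.

```python
def hypergraph_conv(H, X):
    """One parameter-free hypergraph convolution step (assumed variant).

    H: incidence matrix as nested lists; H[v][e] is 1 if node v
       (e.g. a user) belongs to hyperedge e (e.g. a topic), else 0.
    X: node feature matrix, one row (list of floats) per node.
    Computes D_v^{-1} H D_e^{-1} H^T X, i.e. two rounds of averaging.
    """
    n_nodes, n_edges = len(H), len(H[0])
    dim = len(X[0])

    # Step 1: each hyperedge averages the features of its member nodes.
    edge_feats = []
    for e in range(n_edges):
        members = [v for v in range(n_nodes) if H[v][e]]
        if members:
            edge_feats.append(
                [sum(X[v][d] for v in members) / len(members) for d in range(dim)]
            )
        else:
            edge_feats.append([0.0] * dim)

    # Step 2: each node averages the features of its hyperedges.
    out = []
    for v in range(n_nodes):
        edges = [e for e in range(n_edges) if H[v][e]]
        if edges:
            out.append(
                [sum(edge_feats[e][d] for e in edges) / len(edges) for d in range(dim)]
            )
        else:
            out.append(list(X[v]))  # isolated node keeps its own features
    return out


# Toy example: 3 users, 2 topics; user 1 discusses both topics.
H = [[1, 0], [1, 1], [0, 1]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
topic_aware = hypergraph_conv(H, X)
```

Because user 1 belongs to both hyperedges, its output mixes information from users 0 and 2 in a single step. This is the kind of high-order association that a pairwise graph edge cannot express directly.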

Key words: Multi-task Learning, Multimodal Sequential Recommendation, Knowledge Transfer, Dual-Aware Attention, Hypergraph

Received: 2025-07-16

CLC number: TP181

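Funding: Supported by the National Natural Science Foundation of China (No. 62472270, 62272285, 72171137) and the Fundamental Research Program of Shanxi Province (No. 202403021221021)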

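Corresponding author: PANG Jifang, Ph.D., associate professor. Her main research interests include recommender systems and intelligent decision-making. E-mail: purplepjf@sxu.edu.cn.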

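About the authors: TANG Zhe, master's student. His main research interest is recommender systems. E-mail: tangzhe@sxu.edu.cn.
XIE Yu, Ph.D., associate professor. His main research interest is machine learning. E-mail: yuxie@sxu.edu.cn.
WANG Zhiqiang, Ph.D., associate professor. His main research interests include machine learning, data mining, and network big data analysis. E-mail: wangzq@sxu.edu.cn.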

Cite this article:

TANG Zhe, PANG Jifang, XIE Yu, WANG Zhiqiang. Self-Distillation Multi-task Learning Integrating Multi-dimensional Perception for Multimodal Sequential Recommendation[J]. Pattern Recognition and Artificial Intelligence, 2025, 38(8): 669-683.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202508001 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2025/V38/I8/669
